Yet Another Presentation About Simulated Fishers

Ernesto

February 9, 2017

Introduction

Two directions

  • Control
  • Fitting

Common factor

What’s with simulated fishers?

Control

Control

  • Open Loop
    • Scenario Evaluation
    • Policy Optimization
  • Closed Loop
    • Policy Search
    • Policy Discovery

Open Loop

Scenario Evaluation

  1. You have adaptive agents
  2. Somebody hands you a set of policies to test
  3. Apply each in turn
  4. Check which performs best
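
The steps above can be sketched as a loop over candidate policies. The `simulate` function and its quota payoff are made-up stand-ins for a full agent-based run:

```python
# Scenario evaluation: run each candidate policy through the same
# simulation and keep the one that performs best.  `simulate` is a
# hypothetical stand-in for a full agent-based run.

def simulate(policy):
    # Toy payoff: landings rise with the quota but overfishing bites back.
    quota = policy["quota"]
    return quota - 0.001 * quota ** 2

def evaluate_scenarios(policies):
    scores = {name: simulate(p) for name, p in policies.items()}
    return max(scores, key=scores.get), scores

best, scores = evaluate_scenarios({
    "low quota": {"quota": 200},
    "medium quota": {"quota": 500},
    "high quota": {"quota": 900},
})
```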

Tradeable vs Untradeable Quotas

  • We want to set a maximum fishing cap
  • 2 Options
    1. Fishery-wide Quota
    2. Individual Tradeable Quota
  • Which one is more “efficient”?

Tradeable vs Untradeable Quotas - 2

Policy Optimization

  1. You have adaptive agents
  2. Somebody hands you a family of policies
  3. You want to find the “best” parameters

Policy Optimization

Best Quotas

  • The map is full of red and blue fish
  • Red fish live north, blue fish live south
  • We want to impose a separate quota for red and blue
  • We want to maximize landings of red fish and conservation of blue
  • What should the yearly quotas be?
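
With two objectives (red landings and blue conservation) there is no single “best” quota pair, only a Pareto front. A minimal sketch of filtering non-dominated outcomes, with invented numbers:

```python
# Each quota pair yields an outcome (red landings, blue conservation);
# the numbers below are invented for illustration.
outcomes = [(100, 90), (300, 80), (500, 50), (450, 55), (200, 85), (480, 40)]

def pareto_front(points):
    # Keep a point unless some other point is at least as good on both
    # objectives (and therefore dominates it).
    return sorted(p for p in points
                  if not any(q != p and q[0] >= p[0] and q[1] >= p[1]
                             for q in points))

front = pareto_front(outcomes)
```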

Untradeable Quotas

Tradeable Quotas

Pareto Fronts

Closed Loop

Policy Search

Expensive vs Inexpensive Fish

  • The map is full of red and blue fish
  • Blue fish sells for 3 times the price of red fish

No intervention

PID Taxation

  • Expensive (blue) stock gets consumed too rapidly
  • You’d like to limit blue landings to 600 a day
  • You can’t set quotas
  • Taxes: the poor man’s quotas
  • Use a PI controller \[ p_{t+1} = a e_t + b \sum_{i=0}^{t} e_{i} \] \[ e_t = \text{Landings}_t - 600 \]
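
A minimal sketch of the PI tax rule above; the gains `a` and `b` and the way landings respond to the tax are illustrative assumptions, not the model's actual dynamics:

```python
# PI controller for a daily tax on blue landings.  The error is
# landings minus the 600/day target; the gains a, b and the assumed
# response of landings to the tax are illustrative, not the model's.

class PITaxController:
    def __init__(self, a, b, target=600.0):
        self.a, self.b, self.target = a, b, target
        self.integral = 0.0

    def step(self, landings):
        error = landings - self.target
        self.integral += error
        # The tax cannot go negative (no subsidies).
        return max(0.0, self.a * error + self.b * self.integral)

controller = PITaxController(a=0.02, b=0.005)
landings = 900.0
for _ in range(50):
    tax = controller.step(landings)
    # Assumed market response: a higher tax depresses tomorrow's landings.
    landings = 900.0 - 40.0 * tax
```

The proportional term reacts to today's overshoot; the integral term accumulates past errors so the steady-state tax settles where landings hit the target.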

PID Taxation - demo

PID Taxation - optimal

Policy Discovery

  1. You have adaptive agents
  2. You have state indicators and action levers
  3. You have to figure out how to link the two together
  4. You want the decision rule to be optimal

Policy Discovery

Policy Discovery - How?

  • Search a very general policy
  • Reinforcement Learning

Dynamic programming

  • Given state \(S_t\), you’d like to take actions \(a_t,a_{t+1},\dots\) to maximize \[ \sum_{i=0}^{\infty} \gamma^i R_{t+i}(S_{t+i},a_{t+i})\]
  • By the Bellman equation we can solve this recursively \[ V(S_t) = \max_{a_t} \sum_{S_{t+1}} T(S_t,S_{t+1},a_t) \left( R(S_t,S_{t+1},a_t) + \gamma V(S_{t+1}) \right) \]
  • Works for very simple mathematical models
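
Value iteration on a toy one-species fishery illustrates the Bellman recursion; the states, transition rule, and rewards are invented for the sketch:

```python
import numpy as np

# Toy fishery MDP: states are biomass levels 0..9, actions are
# close (0) or open (1); transition and reward numbers are invented.
n_states, gamma = 10, 0.95
actions = (0, 1)

def transition(s, a):
    # Closed: the stock grows one level; open: it is fished down one.
    return min(s + 1, n_states - 1) if a == 0 else max(s - 1, 0)

def reward(s, a):
    # Landings are earned only while the fishery is open.
    return float(s) if a == 1 else 0.0

V = np.zeros(n_states)
for _ in range(500):  # value iteration: apply the Bellman update to convergence
    V = np.array([max(reward(s, a) + gamma * V[transition(s, a)]
                      for a in actions)
                  for s in range(n_states)])

def greedy(s):
    return max(actions, key=lambda a: reward(s, a) + gamma * V[transition(s, a)])

policy = [greedy(s) for s in range(n_states)]
```

Because the toy transition is deterministic, the sum over \(S_{t+1}\) in the Bellman equation collapses to a single term.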

Reinforcement Learning

  • We don’t have \(T(S,S',a)\) and \(S\) dimension is huge
  • However
    • We can use some indicators \(I\) to approximate \(S\)
    • Even better we can approximate \(V(S)\) as \(\bar V (I)\)
    • Even better we can approximate \(Q(S,a)\) as \(\bar Q(I,a)\)
    • We can play the game many times, start with a policy and observe what it does
    • Slowly modify the policy by targeting: \[ a = \arg \max_{a} \bar Q(I,a) \]
  • Additive approximations \[ \bar Q(I,a) = \alpha + \sum_i \beta_i \cos(c_i \pi I) \]
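
The loop above can be sketched as one-step Q-learning with a Fourier cosine basis over a single indicator I in [0, 1]; the basis orders, step size, and action set are assumptions:

```python
import numpy as np

orders = np.arange(5)        # basis orders c = 0..4 in cos(c * pi * I)
actions = (0, 1)             # e.g. close / open the fishery

def features(I):
    # Fourier cosine features of a single indicator I in [0, 1].
    return np.cos(orders * np.pi * I)

# One weight vector per action: Q(I, a) = w[a] . features(I)
w = {a: np.zeros(len(orders)) for a in actions}

def q_value(I, a):
    return float(w[a] @ features(I))

def td_update(I, a, reward, I_next, alpha=0.1, gamma=0.999):
    # One-step TD update toward r + gamma * max_a' Q(I', a').
    target = reward + gamma * max(q_value(I_next, b) for b in actions)
    error = target - q_value(I, a)
    w[a] += alpha * error * features(I)
    return error
```

Repeatedly calling `td_update` along simulated episodes, while acting mostly greedily with respect to `q_value`, is the “slowly modify the policy” loop of the slide.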

Biomass-based control

  • 1 species of fish
  • 300 Fishers

Random Controller

Bayesian Controller - Quota

Reinforcement Learning

  • Can’t set quotas
  • Can only open/close fishery each month
  • Biomass and time of the year are our only indicators.
  • Train it for 1000 episodes, \(\gamma = .999\)

20 years

80 years

Comparisons - 20 years

Method                       20 Years    80 Years
Quota - optimized 20 years    412,056     390,581
Biomass controller            352,566   1,058,428
Random controller             398,069     390,678
Anarchy                       230,225     202,231

Revenue-based controller

  • Perfect biomass monitoring is impossible
  • Can we create a controller looking only at average profits and distance from port (human dimensions)?
  • Train it for 2000 episodes, \(\gamma = .999\)

80 years

Comparisons

Method                       20 Years    80 Years
Quota - optimized 20 years    412,056     390,581
Biomass controller            352,566   1,058,428
Random controller             398,069     390,678
Anarchy                       230,225     202,231
Cash-distance controller      326,116   1,001,269

Problems

  • Rough around the edges
  • Often does not converge
  • Opaque result, hard to describe \[ \bar Q(I,a) = \alpha + \sum_i \beta_i \cos(c_i \pi I) \]

Fitting

Target Heatmap

Heatmap EEI

Fitting as minimization

  • You need to set ABM parameters \(\rho\)
  • You can generate fake logbooks as a function of \(\rho\)
  • Change \(\rho\) until the distance between the real logbook and the fake one is minimized.
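
The fitting loop can be sketched on a toy model; the one-parameter “logbook” generator and the histogram distance are stand-ins for the real ABM and error measure:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rho = 0.7
real_logbook = rng.normal(true_rho, 0.05, size=500)  # "observed" catches

def fake_logbook(rho, n=500):
    # Stand-in for generating synthetic logbooks from the ABM.
    return rng.normal(rho, 0.05, size=n)

def distance(real, fake):
    # Compare the two catch histograms bin by bin.
    bins = np.linspace(0.0, 1.5, 30)
    h_real, _ = np.histogram(real, bins, density=True)
    h_fake, _ = np.histogram(fake, bins, density=True)
    return float(np.sum((h_real - h_fake) ** 2))

candidates = np.linspace(0.1, 1.3, 25)
best_rho = min(candidates, key=lambda r: distance(real_logbook, fake_logbook(r)))
```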

Fitting as minimization

Area differences

Error minimization

Weaknesses of histogram matching

  • Weak to swaps
  • Not informative

Indirect inference

First step

Fitting problem

  • You have ABM with parameters \(\theta\)
  • You have real observations \(x_1,x_2,\dots,x_n\)
  • You’d like to fit by maximum likelihood \[\hat \theta = \arg \max_{\theta} p(x_1,\dots,x_n|\theta)\]
  • Hopeless

Indirect inference

  • Use auxiliary model \(\beta(\cdot)\) to fit the data \[ \hat \beta = \beta (x_1,\dots,x_n) \]
  • Use ABM to generate synthetic data \[ x_1(\theta), \dots x_n(\theta) \]
  • Fit auxiliary model to synthetic data \[ \tilde \beta (\theta) = \beta ( x_1(\theta), \dots x_n(\theta)) \]
  • Minimize the distance between the parameters of the auxiliary models \[ \hat \theta = \arg \min_{\theta} \left( \hat \beta - \tilde \beta(\theta) \right)^2 \]
  • Use logit regressions as our auxiliary model
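
The recipe can be sketched end-to-end on a toy model. Here the “ABM” is a one-parameter data generator and the auxiliary model is an OLS slope rather than the logit regressions used in the slides, but the steps are the same:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.uniform(0.0, 1.0, size=400)      # covariate shared by all datasets

def simulate(theta):
    # Stand-in "ABM": one parameter theta generating synthetic data.
    return theta * z + rng.normal(0.0, 0.1, size=z.size)

def auxiliary(x):
    # Auxiliary model: OLS slope of x on z (the beta of the slide).
    return np.polyfit(z, x, 1)[0]

real_data = simulate(2.0)                # pretend this is the observed data
beta_hat = auxiliary(real_data)

# Scan theta, refit the auxiliary model on synthetic data each time,
# and keep the theta whose auxiliary parameters are closest.
thetas = np.linspace(0.5, 3.5, 31)
beta_tilde = np.array([auxiliary(simulate(t)) for t in thetas])
theta_hat = thetas[np.argmin((beta_hat - beta_tilde) ** 2)]
```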

Robustness of indirect inference

  • Minor assumptions on the auxiliary model
    • Can be misspecified
    • Doesn’t have to predict very well
    • Needs to converge on \(\theta\)
    • Needs to be invertible
  • If \(\beta(\theta)\) is differentiable, the estimation errors are asymptotically Gaussian
  • Distance over parameters is informative

Fitting as minimization - 2

  • You need to set ABM parameters \(\rho\)
  • You can generate fake logbooks as a function of \(\rho\)
  • Change \(\rho\) until the distance between the regression coefficients fitted to the real logbook and to the fake one is minimized.

Simple True Fit

Parameter    \(\beta\)      Standard Error
habit         2.427440699   0.019854316629
distance     -0.007528047   0.0001339981

Model run

  • Agents use simple behavioural rule:
    • Explore new spot with probability \(\epsilon\)
    • Exploration range is size \(\delta\)
  • Fixing \(\epsilon\) and \(\delta\), we can generate logbook data
  • We can find the habit and distance \(\beta\) for the logbook data
  • For what values of \(\epsilon\) and \(\delta\) are the habit and distance \(\beta\) closest to the true numbers?
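
A sketch of that grid search; the mapping from \(\epsilon,\delta\) to the fitted habit and distance coefficients is an assumed smooth stand-in for running the ABM and the logit fit, not the model's actual response:

```python
import numpy as np

rng = np.random.default_rng(2)
true_beta = np.array([2.4274, -0.00753])   # habit, distance coefficients

def fitted_betas(epsilon, delta):
    # Assumed stand-in for "run the ABM, then fit the logit": a smooth
    # mapping from (epsilon, delta) to the two coefficients plus noise.
    habit = 2.8 - 1.5 * epsilon + rng.normal(0.0, 0.01)
    distance = -0.02 + 0.2 * epsilon * delta + rng.normal(0.0, 0.0005)
    return np.array([habit, distance])

grid = [(e, d) for e in np.linspace(0.05, 0.5, 10)
               for d in np.linspace(0.05, 0.5, 10)]
# Rescale the two coefficients so both matter at their own magnitude.
weights = np.array([1.0, 100.0])
best = min(grid, key=lambda p: float(
    np.sum((weights * (fitted_betas(*p) - true_beta)) ** 2)))
```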

Indirect inference - EEI

Indirect Inference Error

Parameter    Original       Moderate \(\epsilon\)   High \(\epsilon\)
habit         2.427440699    2.4425309              2.42788360
distance     -0.007528047   -0.0089882             -0.00943496

Model selection

  • Different behavioural rule:
    • Explore new spot with probability \(\epsilon\)
    • Discount memory of old spots with factor \(\alpha\)
  • Does this rule fit the data better?

Indirect Inference - \(\epsilon\)-greedy

Model Selection - Results